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Abstract: A new method for estimating the angu- 
lar power spectrum Ci from cosmic microwave background 
(CMB) maps is presented, which has the following desirable 
properties: 

(1) It is unbeatable in the sense that no other method can 
measure Ci with smaller error bars. 

(2) It is quadratic, which makes the statistical properties of 
the measurements easy to compute and use for estimation of 
cosmological parameters. 

(3) It is computationally faster than rival high-precision meth- 
ods such as the nonlinear maximum-likelihood technique, 
with the crucial steps scaling as rather than n"^, where 
n is the number of map pixels. 

(4) It is applicable to any survey geometry whatsoever, with 
arbitrary regions masked out and arbitrary noise behavior. 

(5) It is not a "black-box" method, but quite simple to un- 
derstand intuitively: it corresponds to a high-pass filtering 
and edge softening of the original map followed by a straight 
expansion in truncated spherical-harmonics. 

It is argued that this method is computationally feasible 
even for future high-resolution CMB experiments with n ~ 



10" 



10 . It is shown that Ci computed with this method 



is useful not merely for graphical presentation purposes, but 
also as an intermediate (and arguably necessary) step in the 
data analysis pipeline, reducing the data set to a more man- 
ageable size before the final step of constraining Gaussian 
cosmological models and parameters — while retaining all 
the cosmological information that was present in the original 
map. 



I. INTRODUCTION 

The angular power spectrum Ci of the fluctuations in 
the cosmic microwave background (CMB) is a gold mine 
of cosmological information. Since it depends on vir- 
tually all classical cosmological parameters (the Hubble 
parameter ft,, the density parameter fi, the cosmological 
constant A, etc.), an accurate measurement of Ci would 
amount to an accurate measurement of most of these pa- 
rameters In the last few years, the angular power 
spectrum has emerged as the standard way of present- 
ing experimental results in the literature, replacing other 
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fluctuation measures such as the correlation function and 
the Gaussian autocorrelation function amplitude (as de- 
scribed in e.g. Q). There are several reasons for this: 

1. The Boltzmann equation is diagonal in the Fourier 
(multipole) domain rather than in real space, so 
the features of the power spectrum can be given a 
direct and intuitive physical interpretation (see e.g. 

i)- 

2. A plot of power-spectrum estimates allows experi- 
ments to be compared in a model-independent way, 
as opposed to, say, parameter estimates and exclu- 
sion plots that are only valid within the framework 
of particular cosmological models. 

3. For Gaussian models, power spectrum estimation 
constitutes a useful (and arguable necessary) data 
compression trick for making the analysis of future 
megapixel sky maps feasible in practice. 

In what follows, we will pay considerable attention to 
the the third point, since there are at present no unbeat- 
able methods available that are computationally feasible 
when n, the number of map pixels, is very large. The 
CPU time needed for applying the maximum-likelihood 
method directly to a map scales as n^, since it involves 
computing determinants of n x n (non-sparse) covari- 
ance matrices, and the Karhunen-Loeve data compres- 
sion method (see and references therein) unfortu- 
nately requires the diagonalization of an n x n matrix, 
which also scales as . Such a brute-force approach has 
so far only been implemented up to n ~ 4000 p-|ll[ , and 
it currently appears unfeasible to push it much beyond 
n — 10'*. In contrast, the upcoming satellite missions 
MAP and Planck will have n in the range 10^ — 10^. The 
data-compression aspect of power spectrum estimation 
is illustrated in Figure 1: if the power spectrum retains 
(in a distilled form) all the cosmological information that 
was present in the map, then the computationally unfea- 
sible step of estimating the parameters directly from the 
map can be split in to two feasible steps, giving exactly 
the same answer and and the same error bars. This is 
completely analogous to the way in which map-making 
is an intermediate step, and as was recently shown [ p^ , 
there are indeed map-making methods that destroy no 
cosmological information at all. 

In this paper, we will derive a new method for esti- 
mating Ci from maps that has the following desirable 
properties: 
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1. It is unbeatable in the sense that no other method 
can give smaller error bars on Ce or on any cosmo- 
logical parameters upon which Ci depends. 

2. It is quadratic, which means that the statistical 
properties of the estimates are easy to compute. 

3. It is faster than the maximum-likelihood method 
and the eigenmode method with the required 
CPU time for the crucial steps scaling as rather 
than . 

4. It is transparent and easy to understand intuitively. 

The rest of this paper is organized as follows. In Section 
2, we discuss in more detail how to assess the merits of 
a power spectrum estimation method, and derive a sim- 
ple test for determining whether it is unbeatable in the 
above sense. In Section 3, we derive the new method and 
prove that it is in fact unbeatable in the sense of Section 
2. In Section 4, we explore its properties, illustrated with 
an application to the 4 year COBE/DMR data. In sec- 
tion 5, we discuss how to use this method in the analysis 
of a future megapixel map, both for graphically present- 
ing the data and for measuring cosmological parameters. 
Finally, we summarize our conclusions in Section 6. 

II. HOW TO ASSESS HOW GOOD A METHOD IS 

Above we listed three uses for power spectrum esti- 
mation methods. Especially for the third use, as a data 
compression technique, we clearly want a method to have 
the following properties: 

1. It should be computationally feasible in practice. 

2. It should produce estimates of Cf whose statisti- 
cal properties are well enough understood to make 
them useful for parameter estimation and model 
testing. 

3. It should destroy as little information as possible. 

We will now elaborate on the third of these criteria, and 
return to the other two further on. 



A. The notion of a lossless method 

The Fisher information matrix formalism (see for a 
comprehensive review) offers a simple and a useful way 
of diagnosing the methods corresponding to the various 
boxes in Figure 1, to measure how much information they 
destroy. Given any set of cosmological parameters of in- 
terest {h, n, etc.), their Fisher matrix F gives the small- 
est error bars with which the parameters can possibly be 
measured from a given data set. can, crudely speak- 
ing, be thought of as the best possible covariance matrix 



for the measurement errors on the parameters. For in- 
stance, the Cramer-Rao inequality shows that no unbi- 
ased method whatsoever can measure the i*'' parameter 
with error bars (standard deviation) less than 1 / ^/Fa. If 
the other parameters are not known but estimated from 
the data as well, the minimum standard deviation rises 
to (F-i),f . 

By computing the Fisher matrix separately from each 
of the intermediate data sets in Figure 1, we can thus 
track the flow of information down the data pipeline and 
check for leaks. For instance, if the Fisher matrix com- 
puted from the raw time-ordered data (TOD) is identical 
to that computed from the map, then the map-making 
method (denoted W in the figure) is lossless in the sense 
that no information about these parameters has been lost 
in the map-making process. It was recently shown p^ ] 
that some of the popular map-making methods from the 
literature are lossless whereas others are not. The ad- 
vantage of making a lossless map is that this reduces 
the data set to a more manageable size before the more 
complicated nonlinear data analysis step (the final likeli- 
hood analysis) . We will see that the angular power spec- 
trum plays quite an analogous role, allowing us to subject 
the map to a second data compression step before com- 
mencing the final parameter estimation step, and we can 
clearly diagnose it in exactly the same way. Let us make 
the following definition, which is applicable to any data 
compression method whatsoever (to any procedure that 
reduces a larger data set into a smaller one): 

• A data compression method is said to be lossless if 
any set of cosmological parameters can be measured 
just as accurately from the compressed data set as 
from the original data set. 

B. Lossless or not? A simple test 

Unfortunately, this definition is not particularly use- 
ful for diagnosing a method in practice, since it involves 
computing the Fisher matrices for a large or infinite num- 
ber of parameter sets. Fortunately, this is equivalent to 
a much simpler test, as we will now show. 

If the probability distribution / for the data set x (the 
pixels temperatures in a sky map) depends on some pa- 
rameters Ai,A2,..., then the Fisher information matrix 
for these parameters is defined as 

Since / is a probability distribu- 
tion over X, / f{x;Xi,X2,...)d^x = 1 for any choice of 
the parameter vector {A^}. Differentiating this identity, 
we obtain 
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Using this result and the chain rule, we find that if the 
parameter set {Ai} depends on some other parameter set 
{9i}, then the Fisher matrix for these new parameters is 
given by 



where the Jacobian matrix 



(3) 



(4) 



Note that this simple transformation rule holds regard- 
less of whether the probability distribution is Gaussian 
or not. 

If the CMB fluctuations x are Gaussian and isotropic, 
then we know that their probability distribution is en- 
tirely determined by the power spectrum. This means 
that if we choose the parameters A; to be the power spec- 
trum coefficients C^, the Fisher matrix for any cosmo- 
logical parameters whatsoever can be computed directly 
from F*^ , the Fisher matrix for the power spectrum itself: 



where 



J*F'^J. 



30^ ■ 



(5) 



(6) 



In other words, there is no need to compute and compare 
large numbers of Fisher matrices for various parameter 
combinations, since they can all be computed directly 
from F*^. Here and throughout, we will let d denote 
estimates of the true angular power spectrum Cf, so the 
estimates are unbiased if they satisfy 



{Ci) = Ci. 



(7) 



With this notation, we can summarize this section as 
follows. To test if a power spectrum estimation method 
is lossless, we simply compute the covariance matrix 



(8) 



and check if it equals the inverse of the Fisher matrix F*-^ 
of equation ( [TTI ) below. 



C. The power spectrum Fisher matrix 

Let us now evaluate this important matrix F*^. Using 
the addition theorem for spherical harmonics gives the 
well-known correlation function formula 



C= (xx*> =N + ^P^Q 



(9) 



where N denotes the noise covariance matrix and the 
matrices are defined as 



2£+ 1 
An 



(10) 



Here the Pg denote Legendre polynomials and is a 
unit vector pointing in the direction of pixel i. Thus 
dC/dCg = P^, and equation (15) of ^ (which gives the 
Fisher matrix for a general multivariate Gaussian prob- 
ability distribution) yields 



dCi dC' 



= itrC-ip^C-ip^ 
2 



(11) 



Are there any lossless methods? Below we will answer 
this question affirmatively. 



III. THE OPTIMAL METHOD 

In this section, we will derive the above-mentioned loss- 
less method. 



A. A first guess: the ML-method 

In many cases (including some mapmaking algorithms 
|l2|), the maximum- likelihood (ML) method turns out to 
be lossless, so one might guess that this would be the case 
here as well. Indeed, this approach to power-spectrum 
estimation has been applied to the 4 year COBE DMR 
data 1 11 



. ij]. Unfortunately, the ML-estimates Ci turn 
out to depend on the data set x in a highly nonlinear way, 
which gives the ML-estimates two undesirable properties: 

1. They must be found by numerically solving a 
system of nonlinear equations, which is time- 
consuming. 

2. The probability distributions for these estimates 
are virtually hopeless to compute analytically, 
which makes it difficult to use the ML-power spec- 
trum estimates in the last step of the data pipeline, 
in a likelihood analysis to determine cosmological 
parameters. 

For these reasons, it would be a pleasant surprise if the 
ML-method turned out not to be lossless, but inferior to 
some simpler power spectrum estimation technique. 



B. A second guess: quadratic methods 

Fortunately, as we will see below, there are indeed con- 
siderably simpler estimates of the power spectrum that 
are lossless — specifically, quadratic ones. By a quadratic 
estimator, we mean one that is a quadratic function of 
the pixels, taking the form 



Cf = x*E^x - be 



(12) 
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for some symmetric matrix and some constant bg. 
Before embarking on detailed calculations, let us give a 
more intuitive argument for why we might expect the 
best method to be quadratic. It is easy to see that the 
entire data set x can be recovered from the set of all 
pair products XiXj, apart from an uninteresting overall 
sign ambiguity: given the matrix xx', we simply com- 
pute Xi = ±(xx*)^/^ and then fix all signs except one 
using the off-diagonal terms. Since the overall sign is ir- 
relevant, the data set consisting of the 71^ numbers in xx' 
therefore contains all the cosmological information that 
the n numbers in x did. Our quadratic power estimator 
Ci = ^j^- E.y(xx*).y — bi is simply a linear function of 
these pair products. Equation (0) is telling us that 



[XiXj 



(13) 



will follow his notation and let a Greek index denote a 
pair of Latin indices, with a = and /? — (fc, I). With 
this notation, and change from n x n matrices to 
n^-dimensional vectors (or, since they are symmetric, to 
n{n + l)/2-dimensional vectors if we restrict ourselves to 
counting each pixel pair only once — say, j > i). Defining 
the matrix 



= [CikCji + CilCjk] 



(18) 



our 



problem reduces to simply minimizing J2af3 '^ap'E^a^p 
subject to the constraint J^a Pi Eg = 1. Introducing a 



Lagrange multiplier just as in |15| , we find the solution 
to be 



(19) 



i.e., that these pair products are on average just linear 
combinations of the coefficients Ci that we want to mea- 
sure, so by analogy with the mapmaking results of 
we might guess that since the problem is linear, the best 
solution should be linear, so that there exists an estima- 
tor of the form of equation ( p^ ) that is lossless. 

Encouraged by this, we will now derive the the best 
method in the quadratic family. After that, we will give 
a proof showing that this method is lossless, i.e., that 
no other (more nonlinear) method can possibly do any 
better. 



C. The best quadratic method... 

Let us now find the quadratic power spectrum esti- 
mators that give the smallest error bars. Substituting 
equation (fih into equation (12) and chosing 



be = tr NE*^ 
to make the estimate unbiased, we obtain 

where the Window function is given by 
Wff, = trP'^'El 



(14) 



(15) 



(16) 



Let us find the estimate of Ce with minimal variance 
subject to the normalization constraint that W^^ = 1. 
Since we are assuming Gaussianity, the covariance matrix 
of equation (@) is given by 



ijkl 



(17) 



so we want to find the E^ that minimizes Vu subject 
to this constraint. The analogous problem for Galaxy 
surveys was recently solved by Hamilton |ia], and we 



D. ...is in fact both simple... 

Unfortunately, equation ( [l9| ) is not a very useful result 
for our application, since the matrix that needs to be in- 
verted is enormous, with dimensions [n(n+l)/2] x [n{n + 
l)/2] when eliminating the redundant rows and columns 
corresponding to double-counted pixel pairs. For this rea- 
son, Hamilton proceeds |16| to provide an approximate 
method for solving this equation by means of a perturba- 
tion series expansion. Fortunately, the giant matrix M^^ 
can be rewritten in a much simpler form using some al- 
gebraic tricks.]] To show this, let us make an alternative 
derivation of the optimal matrix E^. Since both E^ and 
C are symmetric, we can rewrite equation ( |l7|) as 



Vw = 2tr [CE^CE^ ], 



(20) 



so we simply want to minimize tr CE^CE^ subject to 
tr P^E^ = 1. Introducing a Lagrange multiplier A, we 
wish to minimize the function 



L = tr [CE^'CE^ - 2A(P''E^ - 1)] 



(21) 



Requiring the derivatives with respect to the components 
of E^ to vanish, we obtain 



CE*C = XP\ 



(22) 



so substituting the A that gives tr P^E^ = 1 leaves us 
with the simple solution 



^ Note that for the non-Gaussian case, which is relevant e.g. 
for non-linear clustering in Galaxy surveys, this trick does 
not work, in which case the above-mentioned perturbation 
series expansion of Hamilton is the only approach presently 
available. 
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2F 



-c-^p^c 



(23) 



where F is the Fisher information matrix given by equa- 
tion ([n]), I.e., Fee = tr [C-ip^C-ipV2. Comparing 
this with equation (p^, we see that equation (|2^) has 
the advantage that only a much smaller matrix (n x n) 
needs to be inverted. 



E. ...and lossless 

We will discuss the properties of this power spectrum 
estimation method at some length in Section as well 
as apply it to the COBE data. Before doing this, how- 
ever, we will now prove that this method is lossless in 
the sense defined above. (Our derivation of the method 
merely guaranteed that it was the best quadratic method, 
but did not rule out the possibility that it destroys infor- 
mation and is inferior to some more nonlinear technique.) 

The way we chose to normalize does clearly not af- 
fect the error bars with which we can determine cosmo- 
logical parameters, since multiplying the estimated power 
spectrum coefficients by some constants (or indeed by 
any invertible matrix) will not change their information 
content (see 0). To simplify the calculation below, let us 
therefore scrap the F^^-factor in equation ( |2^ ) and define 
rescaled power spectrum coefficients 



where 



x*E^x, 



W = -C^P^C 



(24) 



(25) 



Substituting equation ( |25| ) into equation (pO|), we now 
find that the covariance matrix for y reduces to simply 



(yy*)-(y)(y)* = F, 



(26) 



the Fisher matrix. Arranging the true power spectrum 
coefficients Ce into a vector c, equation (15) takes the 
simple form 



(y) = Fc. 



(27) 



In other words, the window function matrix W of equa- 
tion ([l6| ) is also equal to the ubiquitous Fisher matrix. 
This means that if we use the vector 



c = F ly 



(28) 



to estimate the power spectrum c, then this estimator 
will have the nice property that it is unbiased: 



c = c. 



(29) 



This Fisher-Cramer-Rao inequality (see for a review) 
tells us that the best an unbiased estimator can possi- 
bly do (in terms of giving small error bars) is for its 



covariance matrix to equal F"-'^, the inverse of the Fisher 
matrix. Using equation ( p6|) and equation (p8|), we find 
that this covariance matrix is precisely 



((c-c)(c-c)*)=F-i[(yy*) 



(y)(y)*]F-i=F-\ 
(30) 



so c is indeed optimal in this sense. In other words, 
we have found the best unbiased estimator of the power 
spectrum, the one which gives the smallest error bars 
allowed by the Fisher-Cramer- Rao inequality |^ The fact 
that it turned out to be a simple quadratic estimator 
is good news for CMB data analysis, since this means 
that it is much simpler to implement in practice than for 
instance the highly nonlinear ML-method. 



IV. A WORKED EXAMPLE: THE COBE DATA 

In this section, we will discuss various aspects of how 
the method works. To prevent the discussion from be- 
coming overly dry and abstract, we will illustrate it with 
a worked example: application of the method to the 4 
year COBE/DMR data. 



A. The COBE power spectrum 

We combine the 53 and 90 GHz channels (A and B) of 
the COBE DMR 4 year data into a single sky map 
by the standard minimum-variance weighting, pixel by 
pixel. We use the data set that was pixelized in galactic 
coordinates. After excising the region near the galactic 
plane with the "custom cut" of the COBE/DMR team 
||l7| , n = 3881 pixels remain. As has become standard, 
we make no attempts to subtract galactic contamination 
outside this cut. We remove the monopole and dipole 
and include this effect as described in the Appendix. 

The resulting power spectrum is shown in Figure |^ 
(top) , and is very similar to that extracted with the eigen- 
mode method [M — we will discuss the relation between 



When a large fraction of the sky is covered, C'e is effectively 
the average of {2£ + 1) independent multipoles, so for £ > 50, 
it will have a virtually perfect Gaussian distribution by the 
central limit theorem. This means that the post-compression 
Fisher matrix is almost exactly F, the value computed from 
the map, so that the power spectrum estimates retain all the 
small-scale cosmological information. For the very lowest mul- 
tipoles £ ^ 4:, the Gaussian approximation becomes poor, so 
that it might be desirable to supplement the power spectrum 
estimates with some linear measures of the large-scale power 
(flfm-coefficients, say, as suggested in to ensure that every 
bit of information is retained. This may be desirable anyway, 
to avoid non-Gaussianity in the like lihood calculation, as will 
be further discussed in Section V B 
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various methods below. A brute force likelihood analysis 
of the 4 year data set |jl^ gives a best fit normalization 
of Qrms,ps = 18.4 jjK for pure Sachs- Wolfe n = 1 model, 
corresponding to the heavy horizontal line in the figure, 
and we used this as the fiducial power spectrum when 
computing C. If this model were correct, we would ex- 
pect approximately 68% of the data points to fall within 
the shaded 1 — a error region. As can be seen, the height 
of this region (the size of the vertical error bars) is domi- 
nated by cosmic variance for low i and by noise for large 



B. The window functions 

Our COBE example illustrates a number of general 
features regarding how the window function depends on 
the target multipole and on the sky coverage, as shown in 
Figures 3 and 4. However, since we are conforming to the 
customary way of plotting window functions here, rather 
than to the definition of equation ([l^), a few clarifying 
words regarding precisely what is plotted are in order 
before proceeding. 



1. What they mean 

As has become standard, the vertical axis in Figure | 
shows not Ci but the relatively flat quantity 



(31) 



so we want to interpret the data points as weighted aver- 
ages of these quantities (rather than as weighted averages 
of the -coefficients) , with the window function giving 
the weights. In addition, we must take into account the 
fact that the COBE beam smearing suppresses the true 
multipoles Ct by the known factors Bj given by |19|. We 
thus rewrite equation (O) as 



C = 



= N 



where we have defined P = /i^P^ and 



(32) 



(33) 



^(^-f l)/27r' 

and find the Fisher matrix for the ZJ^-coefficients to be 

F«' = //£^£'F«'. (34) 

Since each window function by definition must add up 
to unity, the correct normalization for the band-power 
estimators of Dp is 



Dp = Npjipyp 



^x*C-ip,C-ix, 



(35) 



Np 



This implies that the mean is 

{Dp)^Y.^U'Dp,, 



where the window function 



(36) 



(37) 



(38) 



satisfies 'W ppi = 1, and the covariance is 

{DpDp,) - {Dp) {Dp,) - A^,A^^.r«,. (39) 

For I — P ^ this expression gives the (squared) error bars 
plotted on the data points Dp in Figure ||. The horizon- 
tal point locations and the corresponding horizontal bars 
give the means and r.m.s. widths of the window functions 
W«'. 

2. How they depend on the target multipole 

Figure || shows the window functions for estimating 
the multipoles l^, =10, 15 and 20. As can be seen, their 
shape and width is more or less the same, so increasing 
the target multipole merely translates them in ^-space. 
This quantitative result is easy to understand in terms of 
the quantum mechanics analogy made in [p^ : if the wave 
function of a quantum particle on a sphere is required to 
vanish in certain regions, then its angular momentum 
distribution (spherical harmonic coefficients) must have 
a certain minimum width which is independent of the 
average angular momentum (in our case, independent of 
the target multipole £*). 

3. How they depend on the sky coverage 

The Heisenberg dispersion formula ( "uncertainty rela- 
tionship") tells us that this minimmn width is of order 



1/A0, 



(40) 



where the normalization constants are defined by 



where A9 is the angular size in radians of the small- 
est dimension of the sky patch. This simple scaling is 
quantitatively illustrated in Figure |^. For instance, com- 
paring the two middle panels shows that adding a sec- 
ond hemisphere does not reduce the width of the win- 
dow (since A9 remains unchanged), but merely removes 
the power leakage from the even multipoles (since the 
galaxy cut is approximately symmetric under reflection, 
and even and odd multipoles remain roughly orthogonal, 
since they have opposite parity). 
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C. The essence of the method 

How docs our method work? In this section, we wiU 
see that it is quite straightforward to acquire an intu- 
itive understanding what the method does with the data 
and why this improves the situation. First we note that 
equation (^5|) can be rewritten as 



D, cx z'P^z, 



where the vector z is defined as 



(41) 



z = C-^x. (42) 

Since z also consists of n numbers, we can plot it as 
a sky map, as is done in Figure |^. Moreover, by the 
addition theorem for spherical harmonics, we can factor 
the matrix Pg as 

P' = Y,Y*, (43) 
where the n x [2i + l)-dimensional matrix is defined 



YL=F,„(f,). 



(44) 



Here and throughout, we let Y^m denote the real-valued 
spherical harmonics, which are obtained from the stan- 
dard spherical harmonics by replacing e"""^ by \/2 sin mi/), 
1, a/2 cos for m<0, m — 0, m>0 respectively. 
Combining the last four equations, we find that 



De cx 



(2e + i) ^ 



Ylm(ji)Zi 



(45) 



which we recognize as the method of expansion in trun- 
cated spherical harmonics p0[-p2| , but applied to the map 
z instead of the map x. 

Figure |^ compares the maps x and z, and visual in- 
spection reveals that although the small-scale features of 
X remain visible in z, there are two obvious differences: 

1. z looks high-pass filtered, with large-scale fluctua- 
tions rendered almost invisible. 

2. The edges are softened by downweighting the pixels 
near the galaxy cut, notably in the high signal-to- 
noise case. 

As described below, both of these phenomena have a sim- 
ple intuitive explanation. 



1. High-pass filtering 



Since (xx*) = C, it follows that 



(zz*> = C-i(xx*)C-i = C-^ 



(46) 



In other words, those modes which had the most power 
in the original map x have the least power in the map 
z and vice versa. Computing Dg from z with the sim- 
ple spherical-harmonic method is therefore akin to the 
standard way of estimating a mean with inverse- variance 
weighting: first we re-weight the numbers by dividing 
them by their variance, then we perform a straight aver- 
age on the result. Figure || illustrates the effect of this 
procedure in the Fourier (multipole) domain, on the win- 
dow functions. The top panel shows the result of apply- 
ing the straight spherical-harmonic method directly to x. 
The reason that the results are so poor is that the power 
spectrum Cg falls rapidly with i, so that even though the 
"red leak" from lower multipoles is small on geometric 
grounds, the amount of large-scale power being aliased 
into the estimates of high multipoles is nonetheless com- 
parable to the weak small-scale signal that we are trying 
to measure. The middle panel shows the optimal method, 
i.e., applying the straight spherical- harmonic method to 
z. Since the power spectrum of z is "tilted" to suppress 
the large-scale power, the troublesome red leak is seen to 
be virtually eliminated: it no longer matters that some 
fraction of the large-scale power in z gets aliased down 
to small scales, since there is is so little large-scale power 
there in the first place. 

The bottom panel shows the window function that 
would result if the signal-to-noise ratio was about 500 
times higher, roughly corresponding to what is expected 
for the upcoming Planck satellite. Here the high-pass fil- 
tering is seen to be more extreme, and the window func- 
tion is seen to be slightly narrower still. If the pixel noise 
is uniform and uncorrelated, then C will clearly become 
proportional to the identity matrix if we let the signal-to- 
noise ratio approach zero. This means that if the noise 
in a map is orders of magnitude greater than the signal, 
then z cx X and the best pixel weighting becomes to do 
nothing at all, leaving the map as it is. In this sense, 
the three maps in Figure ^ form a progression of z-maps 
corresponding to increasing signal-to-noise. 



2. Edge tapering 

The second salient feature of the method, "feather- 
ing" the edges, is also easy to understand in terms of 
the quantum mechanics analogy given in [^3|. For the 
window function to be narrow, we loosely speaking want 
the pixel weighting to be narrow in the Fourier (multi- 
pole) domain, so to avoid excessive "ringing" in Fourier 
space, the weighting in real space should be continuous 
and smooth. This standard signal-processing procedure 
is also known as apodizing, and is routinely used in the 
analysis of one-dimensional time series data ||2^]. An al- 
ternative way to see why the edges are softened is to 
consider the noise map, i.e., the map of the standard 
deviations of the pixels in z. This is simply the square 
root of the diagonal part of its covariance matrix, i.e.. 
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A^i = (C~^)]^/^. As is readily verified even for a simple 
one-dimensional array of pixels, the diagonal elements of 
C^^ will be smaller near the edges even though all diago- 
nal elements of C itself are the same (since Cy = c(ri -rj) 
for some correlation function c when the noise is uni- 
form). 

Just as with the high-pass filtering, the degree of edge 
tapering is seen to increase with the signal-to-noise ratio 
S/N. As mentioned, S/N ^ gives C cx I, i.e., no edge 
softening at all, whereas the noise map tends to zero at 
the edges when S/N oo. In this sense, our new method 
strikes a balance between the all-out apodization of the 
eigenmode method and the laissez-faire approach of 
the Hauser- Peebles method , with the amount of soft- 
ening depending on what is affordable given the noise. 
Clearly, if S/N ~ 0, the sole source of variance in 
is the noise, so the widths of the window functions (and 
hence the cosmic variance leakage from unwanted aliased 
multipoles) is irrelevant, and we simply wish to weight 
all pixels equally (or by the inverse of their noise vari- 
ance if the noise is non- uniform) . If there is very little 
noise, on the other hand, we can go to great lengths to 
narrow down the window functions. Indeed, the excellent 
S/N of Planck drives the algorithm to "deconvolve" the 
power spectrum to obtain an even narrower window in 
the bottom panel of Figure ^ than in the middle one. 



D. Speed issues 

A quick glance at equation (^5|) might give the impres- 
sion that inversion and multiplication of n x n matrices 
are required to compute the estimated power spectrum 
Di, which are devastatingly slow operations. Fortu- 
nately, this impression is deceptive, since equation ( ^ ) 
shows that merely a matrix- vector product (ordo ri^), a 
vector-vector product (ordo n) and a linear equation so- 
lution stemming from equation ( j^) need be carried out 
to obtain the raw power estimates. The latter is also an 
ordo operation when using an iterative approach, e.g., 
the Gauss-Seidel method. Moreover, there is no need to 
store the n x n elements of or C, since they can be 
rapidly computed on the fiy, as needed, for instance with 
cubic spline interpolation, so the storage requirements 
here are merely ordo n. 

Similar manipulations enable great time savings when 
computing the Fisher matrix, which as we saw gave both 
the window functions and the covariance matrix of the 
power estimates by simple rescalings of its rows and 
columns. Suppose we are interested in the power spec- 
trum up to some monopole imax- There are {Imax + 1)^ 
spherical harmonics with £ < imax, so let us define the 
n X {imax + l)^-dimensional matrix Y just as in as 



equation (^) and the fact that a trace of a product of 
matrices is invariant under cyclic permutations, we ob- 
tain the useful result that 



YiA = Yim{ri). 



(47) 



m—~£ m' — - 



where the matrix V is defined as 



Vee Y*C-iY. 



(48) 



(49) 



Since C~^Y can be solved iteratively for each spheri- 
cal harmonic (row of Y) separately, and since the struc- 
ture of equation ( 48 ) shows that there is no need to ever 
load the entire V-matrix into memory all at once, the 
computation of F thus poses no significant computer 
memory challenges and lends itself well to paralleliza- 
tion. The computation of the the noise bias corrections 
bi are straightforward to accelerate in an analogous way. 
Note that equation (^ ) also proves that all elements of 
the Fisher matrix are non-negative, which among other 
things means that the optimal method will never give 
window functions that go negative. 

Alternatively, if an approximation of F is deemed sat- 
isfactory, it can of course be estimated quite rapidly by 
computing Di from a large number of Monte-Carlo skies. 

Finally, it should be noted that the calculation of 
the parameter covariance matrix takes roughly the same 
amount of time for this method as it does for the Hauser- 
Peebles method. For that case, the covariance matrix for 
the power spectrum estimates can be rewritten in the 
form of the right hand side of equation (^8|) but with 
V = Y*CY instead of Y'C-^Y Since matrix multi- 
plication takes roughly as long as matrix inversion, this 
shows that although the simple approach of estimating 
the power spectrum with a straight truncated spheri- 
cal harmonic expansion of the original map is inferior 
in terms of destroying information, it is not substantially 
faster. 



V. FURTHER DOWN THE PIPELINE: WHAT TO 
DO WITH de 

In this section, we will briefly discuss that part of the 
data analysis pipeline in Figure 1 which lies beneath the 
power spectrum estimation step. As discussed in the in- 
troduction, there are many reasons to plot the power 
spectrum for direct visual inspection. The power spec- 
trum estimates also constitutes a small and manageable 
data set that retains all the cosmological information 
from the original map in Gaussian models, and is there- 
fore useful as a starting point when constraining cosmo- 
logical models and model parameters. We will now dis- 
cuss these two applications in turn. 



Here we have combined i and m into a single index 
A = i'^ + i + m + 1 = 1,2,3,... Using equation (lu]). 
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A. Using C for "chi-by-eye" 

Given the vector of raw power spectrum estimates y, 
where yg = ixgyg, ^ = 2, ...,lraax, and y^ is defined as in 



Section HIE, there are a number of ways to take hnear 
combinations of them, normahze them and plot them 
with vertical and horizontal error bars as in Figure |^. We 
will now discuss some natural ones briefly and comment 
on their relative merits. 



1. Raw estimates 

The minimalistic approach is of course to do nothing 
and simply plot y as is. This is the simple approach 
taken in the top panel of Figure ^j. 



any choice of U, all the new data points will be uncorre- 
lated with unit variance (they are of course appropriately 
rescaled when plotted), and the new window function ma- 
trix will be simply X* . 



5. Principal components 

One special case of the above approach is to choose 
U = I, the identity matrix, which reduces it to a so- 
called principal component analysis. By sorting the 
new data points (the principal components) by their 
eigenvalues, one can rank them from best to worst and 
throw away a substantial number of essentially redundant 
ones, thereby getting around the problem that there are, 
loosely speaking, too many data points for them to all be 
uncorrelated and yet not in some sense pathological. 



2. Band power 

One disadvantage of the previous approach is the pro- 
fusion of data points, sampling the power spectrum with 
a much narrower ^-spacing (in intervals A£ = 1) than the 
scale on which typical models are expected to vary notice- 
ably (typically M > 10, at least for ^ > 50). A simple 
remedy is of course to average the raw power estimates 
in appropriate bands, essentially smoothing Figure ^. 



3. Deconvolved power 

The exact opposite approach is to "un-smooth" or de- 
convolve the power spectrum so that all data points be- 
come uncorrelated and all window functions become Kro- 
necker delta functions. Although this can be forma lly ac- 
complished by computing F^^y as in Section III E , where 



we did this to prove that our method had retained all 
the information that there was, this is of course a terri- 
ble idea in practice, since incomplete sky coverage makes 
the Fisher matrix nearly singular. The result would be 
a plot with gigantic error bars, which is simply Nature's 
way of telling us that we cannot really measure the power 
spectrum with a resolution below the natural scale set by 
l/M. 



4-. Uncorrelated data points 

A more fruitful way of producing uncorrelated data 
points is to plot the numbers in the vector X*y, where 
the rows of the matrix X are the solution vectors to the 
generalized eigenvalue problem 



UX = FXA 



(50) 



6. Hamilton coefficients 

In two recent papers p5| , p^ , an extensive study of 
choices of U was carried out for the related problem of 
how to present the power spectrum measured from galaxy 
surveys. It was found that most choices, including that 
of principal component analysis, are unfortunately not 
particularly useful, since they tend to produce quantities 
whose window functions are "unphysical" in the sense 
of being extremely broad and often negative. However, 
it was found that a certain limiting case ||l^ tends to 
produce nice and clean window functions, and in addi- 
tion eliminates the need to solve an eigenvalue problem. 
These cases correspond to factoring the Fisher matrix as 



F = MM* 



(51) 



for some symmetric matrix U. Here A is a diagonal ma- 
trix containing the eigenvalues. It is easy to show that for 



for some matrix M, and chosing X = M Of the 
infinitely many choices of M, three are particularly at- 
tractive H: 

1. If one requires M to be lower-triangular, in which 
case F = MM* corresponds to a Cholesky decom- 
position, the COBE case gives the narrow and non- 
negative window functions in the middle panel of 
Figure]^, with side lobes only to the right. 

2. Similarly, one could obtain window functions with 
side lobes only to the left by chosing M upper- 
triangular. 

3. A third choice, which is the personal favorite of the 
author, is choosing M symmetric, which we write 
as M = F^/^. The square root of the Fisher ma- 
trix is seen to give beautifully symmetric window 
functions (Figure 0, bottom) that are not only non- 
negative, but also even narrower than the original 
(top), which has roughly the bottom profile con- 
volved with itself. 



9 



The bottom panel of Figure I shows the COBE power 
spectrum plotted with this last method. These 29 data 
points thus contain all the cosmological information from 
COBE, distilled into 29 chunks that are not only collec- 
tively exhaustive (jointly retaining all the cosmological 
information), but mutually exclusive (uncorrelated) as 
well. The above-mentioned band averaging now has the 
nice feature that the band powers will automatically be 
uncorrelated as well, so to reduce scatter, the 29 mea- 
surements have been binned into 8 bands in Table 1 and 
Figure ^. The data and references for the other experi- 
ments plotted can be found in recent compilations pq,E6| . 



7. Negative power? 

We close this discussion by remarking that with all 
these approaches, it is possible for data points to be neg- 
ative, which may annoy certain readers since the true 
power spectrum is by definition nonnegative. It should 
be emphasized that this is a purely stylistic issue of no 
scie ntific importance whatsoever. As we proved in Sec- 
tion III E, the raw power estimates contain all the cos- 



mological information there is, regardless of whether we 
plot them or not. The total power that we measure in 
a given multipole will always be nonnegative, and the 
reason that negative values can occur in figures is simply 
that we are plotting the difference between two positive 
quantities: what we measure and the noise bias. Plotting 
the the sum of yi and the noise bias would result in fig- 
ures guaranteed to be free of negative points, clearly con- 
taining exactly the same information as those described 
above since the noise bias is a set of known constants, 
but having the undesirable property of being biased up- 
ward. Alternatively, some non-linear mapping could be 
used to guarantee positivity of the plotted points, but for 
subsequent analysis as outlined in the following section, 
we obviously want to retain our simple quadratic esti- 
mators to avoid complicating the statistical properties of 
the measured power spectrum. 



B. Using Ci for parameter estimation 

The second, and arguably most important, use for the 
power spectrum estimates is to make it possible to place 
sharp quantitative constraints on cosmological models 
and their parameters. 



C. The simple-minded approach: maximum 
likelihood 

As described in likelihood analysis has emerged as 
one of the most popular data analysis tools in cosmology 
because it is often simple to implement and in addition is 
the best method in certain asymptotic situations. In our 



case, measuring say the 11 CDM-parameters of |£j via a 
direct likelihood analysis using our power spectrum esti- 
mates Di would unfortunately be extremely cumbersome 
numerically for the case of a megapixel CMB map. The 
reason is that it would, even in the crude and poor ap- 
proximation that Z?£ is Gaussian all the way down to the 
lowest multipoles, require computing their (.max x (max 
covariance matrix at a grid of points in parameter space. 
Although this covariance matrix is relatively small, we 
saw above that its computation was so time-consuming 
that it would be preferable to compute it only once (or 
a few times). In addition, the central limit theorem only 
guarantees that the multipole estimates have Gaussian 
probability distributions for £ > 50, so the exact likeli- 
hood function will be extremely cumbersome to compute 
when the lower multipoles are included in the analysis. 



D. Why chi-squared can be just as good 



This situation is similar to that in Section III A . where 



we found that the ML-method was numerically undesir- 
able and hoped that a simpler method would provide 
error bars that were equally good or better. In that in- 
stance, a simple quadratic method came to the rescue, 
and we will argue that the story repeats itself here in 
the final step in the pipeline: that the simple (quadratic) 
chi-squared method is likely to be, if not better, at least 
almost as good as the ML-method when fitting parame- 
ters to the power spectra measured from megapixel CMB 
maps. 

The Karhunen-Loeve data compression problem has 
been generalized and solved for the case at hand here 
Q : we have a data vector (say, the raw power spectrum 
estimates y) whose mean 



m = (y) 



and covariance matrix 



M=(yyO-(y)(y* 



(52) 



(53) 



both depend in a known way on the parameters that we 
wish to estimate (say /i, fi, etc.). When changing the 
parameters, the likelihood function of course changes for 
two reasons: 

• Because m changes 

• Because M changes 

Another way of phrasing this is that the information that 
the ML-method extracts from the parameters comes from 
two sources: the parameter dependence of m and that of 
M. In 1^, it was found that for our case, as long as the 
parameters were well constrained (as is anticipated for 
both MAP and Planck |^), virtually all the information 
came from m, not from M. This means that, in the 
Gaussian approximation, the likelihood curve near the 
maximum is well approximated by 
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L oc [detM]~i/2exp 



1 



(y-m)*M-i(y-m) 



exp 



i(y-m)*M-i(y-m) 



(54) 



where M is a merely a constant matrix evaluated some- 
where near the best fit point in parameter space. Maxi- 
mizing this is of course amounts to making a simple chi- 
squared model fit to the plotted power spectrum, and 
requires no overly tedious numerical calculations at all: 
the mean vector predicted by a given model is simply 
given by equation (p^), e.g., by the Fisher matrix (which 
is computed once and for all) times the model power 
spectrum. If the best fit power spectrum turns out to 
be so different from the one assumed when computing F 
that the accuracy of equation ( |5^ ) comes into doubt, the 
best fit parameter values can of course be used to repeat 
the entire procedure iteratively. Needless to say, the ap- 
proach outlined in this section needs to be implemented 
and tested in considerable detail before any strong state- 
ments can be made about its feasibility. For instance, 
there is no guarantee that the chi-squared method (nor, 
for that matter, the ML-method) will give unbiased pa- 
rameter estimates, so Monte-Carlo calibrations are nec- 
essary in either case. The fact that the multipole es- 
timates with £ ^ 50 are non-Gaussian also means that 
chi-squared parameter estimates made with the Gaussian 
expression of equation ( ^^ are likely to give error bars 
on the parameters that are slightly above the theoreti- 
cal minimum. We merely conclude this section by saying 
that it appears plausible that a simple chi-squared ap- 
proach in the final step of Figure 1 will give error bars 
on cosmological parameters that are almost as small as 
theoretically possible. 



VI. CONCLUSIONS 

We have presented a new method for power spectrum 
estimation from CMB maps and argued that it is the best 
choice for the box labeled E in Figure 1: 

• Just as the mapmaking step above it, it can com- 
press the data set by a large factor while retaining 
all the cosmological information. 

• It is simple enough to be computationally feasible 
in practice even for future megapixel sky maps. 

• The statistical properties of the power spectrum es- 
timates are straightforward to compute, and using 
a simple chi-squared parameter fitting approach in 
the bottom box in the figure is likely to give error 
bars on the parameters that are almost as small as 
theoretically possible. 

We illustrated the method details by applying it to the 
4 year COBE/DMR data. It roughly speaking involves 
subjecting the map to a high-pass filter and some edge 



softening, and then analyzing the resulting map with the 
Hauser-Peebles method, i.e., with a straight expansion in 
truncated spherical harmonics. It reduces to the Hauser- 
Peebles method in the limit of zero signal-to-noise. Both 
of these methods require approximately the same amount 
of CPU time. When the signal-to-noise is very high, it 
becomes quite similar to the maximum spectral resolu- 
tion method [p^ . However, even in this limit, it has the 
advantage (in addition to being strictly lossless, which 
the method presented in ||l^ is not) that it does not re- 
quire the solution of an eigenvalue problem, thus being 
much faster. 

It has recently been shown that there is a mapmaking 
method that is both lossless [|l^ and computationally fea- 
sible ll^. Combining that with the present results, we 
conclude that all aspects of the "main tube" of the data 
analysis pipeline are now under control, making it plausi- 
ble that future CMB missions can deliver the promise of 
accurate measurements of cosmological parameters not 
merely in principle but also in practice, without floun- 
dering on computational difficulties, if all other aspects 
of the problem are as simple as possible. By this last 
caveat, we mean the following rosy scenario: 

• There are no unforeseen systematic errors. 

• Non-Gaussian and anisotropic foregrounds can be 
removed down to a tolerable level already in the 
map stage. 

• The CMB fluctuations are Gaussian. 

• The true model will turn out be be something sim- 
ilar to what we expect, so that the power spectrum 
really contains information about our classical cos- 
mological parameters. 

All of these issues require substantial amounts on work 
before we can trust our fledgling data analysis pipeline 
in Figure 1 to be able to make the most of the real CMB 
data sets that await us. 

The author wishers to thank Andrew Hamilton for 
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by NASA through a Hubble Fellowship, #HF-01084.01- 
96A, awarded by the Space Telescope Science Institute, 
which is operated by AURA, Inc. under NASA contract 
NAS5-26555. The COBE data sets were developed by the 
NASA Goddard Space Flight Center under the guidance 
of the COBE Science Working Group and were provided 
by the NSSDC. 



APPENDIX: MONOPOLE AND DIPOLE 
REMOVAL 

Since the monopole and the kinematic dipolc of the 
CMB are orders of magnitude larger than the multipoles 
of cosmological interest (£ > 2), one customarily removes 
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them from the map x prior to any subsequent analysis. 
Indeed, the monopole cannot even be measured by dif- 
ferential experiments (such as COBE/DMR). 



3. Follow our method to the letter, but in place of 
(which is undefined), use the "pseudo-inverse" 
of C, the matrix defined as 



A. The problem 



M = n 



C + jfLT} 



(58) 



Suppose that all multipoles £ < £o are removed from 
the data set. This means that the data set at our dispo- 
sition, say X, is given by 



nx, 



(55) 



where 11 is a projection matrix that projects onto the 
subspace orthogonal to these multipoles. If the columns 
of a matrix Z form an orthonormal (Z*Z = I) basis for 
the space of these unwanted multipoles^, then 

n = I-ZZ*. (56) 

The covariance matrix of the available data is therefore 



ncn* 



(57) 



Since Z has £q columns, 11 and C have rank n' = n — £q. 
Hence C is singular and non-invertible, and the method 
we have presented cannot be applied in its most straight- 
forward implementation. This problem occurs simply be- 
cause the numbers in x are not independent, i.e., since 
£q of them can be expressed as linear combinations of the 
others. 



B. Three solutions 

There are three different ways of incorporating this 
complication into our analysis: 

1. Simply throw away £q of the data points in x. 
All the remaining data points are independent, so 
the resulting covariance matrix (C with the corre- 
sponding rows and columns removed) is invertible. 
The drawback of this approach is that the covari- 
ance matrix tends to be poorly conditioned, which 
may give numerical difficulties if n is very large. 

2. Use C rather than C for the calculations, but 
choose a fiducial power spectrum where Ci is very 
large for £ < £q. Then the method itself will remove 
the unwanted multipoles with great accuracy. This 
approach is of course not strictly correct, but is 
easy to implement and an excellent approximation 
in most realistic cases. 



^Such a matrix Z is readily constructed by starting with 
the unwanted columns in the multipole matrix Y (the first 
4 columns if io = 2) and orthonormalizing them with Gram- 
Schmidt or Cholesky procedure ]ict . 



C. The pseudo-inverse approach 

We will now show that the definition of M is inde- 
pendent of the choice of the constant rj and that 
this pseudo-inverse approach gives the desired optimal 
results. Let R be an orthonormal nxn matrix whose first 
£q columns equal the columns of Z. In other words, the 
columns of R form an orthonormal basis and split into 
two blocks, corresponding to the unwanted and wanted 
multipoles, respectively. In this new basis, the projection 
matrix 11 takes the simple form 



R*nR 




I 



(59) 



where the upper left block in this nxn matrix has size 
£qx£q, etc. In the same basis, C therefore takes the form 



R*CR = (R*nR)(R*CR)(R*nR) 





c. 



(60) 



for some non-singular n' x n' matrix C*. It thus follows 
that 



R*MR=(R*nR) R*CR + r7(R*ZZ*R) 





I 



I 





c, 

r]-^I 

c-1 



I 



I 



R*n*R 




I 





c-1 



(61) 



independent of rj. Combining the last three equations, we 
see that MC = CM = n, which implies that n[MC - 
I]n* — 0. Thus although there cannot be a matrix M 
such that MC — 1 = 0, our choice of M comes close to 
being an inverse in the sense that the equation MC — 
1 = becomes true when you project out the unwanted 
multipoles. 

Let us now show that the pseudo-inverse approach is 
lossless. Since 



R*x= (R*nR)(R*x) = 



(62) 



for some n'-dimensional vector a, we can compute the 
Fisher matrix directly from a, since it clearly contains 
all the information. Its covariance matrix is (aa*) = C*, 
whose derivatives are given by 





ac. 



R' 



ac 



R = R'np'^nR. 



(63) 
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Since HP = for f < j we of course have no informa- 
tion about these muhipoles. The power spectrum Fisher 
matrix is given by 



-tr 



^-1 Q-i 



= -tr 
2 

1 

= -tr 
2 



dCi * dCt 
(R*MR) (R*nP^nR) (R*MR) (R*nP^ HR) 



MP'^MP^ 



(64) 



where in the last step, we used that RR* — I and Mil = 
nM = M. Using MC = 11, the covariance matrix of 
equation (EOh reduces to 



CMP'^MCMP'^ M 



MP^MP*^ 



(65) 



so the argument of Section [II E shows that the smaUest 
possible error bars are attained. 



D. The pseudo-inverse in practice 



Since Z contains merely a few columns, computing M 
using equation ( psf ) would be virtually as fast as com- 
puting C""'^, and neither Cholesky decomposition nor it- 
erative methods for computing the map z — Mx suffer 
any noticeable speed loss because of the monopole/dipole 
complication. For instance, for the simple case — 1, 
the pseudo-inverse is computed by simply 

1. subtracting the mean from all rows and columns of 
the matrix, 

2. adding the constant 77 to all the matrix elements, 

3. inverting the matrix, and 

4. again subtracting the mean from all rows and 
columns. 

Note that rj need not be small. As the result is inde- 
pendent of ?7, it is numerically advantageous to make the 
monopole comparable to other eigenvalues, which cor- 
responds to chosing r/ = a/n, where a is the order of 
magnitude of a typical matrix element .F] 



* This important special case Iq = 1 also applies to the 
related problem of making CMB maps (symbolized by the box 
marked W in Figure 1) from differential measurents, where 
one needs to "invert" a matrix which is singular because its 
rows and columns have zero mean. It is easy to show that 
for the COBE mapmaking method, the lossless map is 
obtained by using the pseudo-inverse described here. When 



making their maps [|rj, the COBE/DMR team regularized 
the inversion by adding Vjl to their matrix, choosing t) very 
small. A better way of is therefore to add r) to all the n x n 
matrix elements, not merely to the diagonal, and to choose 
rj — a/n rather than infinitesimal. 
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TABLE L The COBE/DMR power spectrum 

5T = [e{£+l)Ce/2TvY^^ in fiK. 
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FIG. 1. Power spectrum estimation as an intermediate data-compression step in converting the raw data from a future 
megapixel experiment to measurements of cosmological parameters. If all three data compression steps are lossless, then this 
data analysis pipeline will measure the parameters with just as small error bars as a (computationally unfeasible) likelihood 
analysis measuring the parameters directly from the time-ordered data. 
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FIG. 2. The power spectrum estimated from the COBE/DMR 4 year data. 
The observed multipoles Dg = £{£ + l)Ct are plotte d with 1 — cr error bars using our basic method (top) and made 
uncorrelated with the F^/^-method of Section VA6 (bottom). The vertical error bars include both pixel noise and 
cosmic variance, and the horizontal bars show the width of the window functions used. If the true power spectrum is 
given by n = 1 and Qrms,ps = 18.4/iK (the heavy horizontal line), then the shaded region gives the 1 — a error bars 
and the dark-shaded region shows the contribution from cosmic variance. 
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FIG. 3. How the window functions depend on the target multipole. 
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FIG. 4. How the window functions depend on the sky coverage. 
As the sky coverage decreases, the window function widens from a Kronecker delta to have a width A£ ~ 1/A9. 
In addition, since the custom cut is almost symmetric, it approximately preserves the orthogonality of even and odd 
multipoles. 
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FIG. 5. Sky maps before and after power tilting. 
The 4- year COBE/DMR data (top) is shown after multipHcation by the inverse pixel covariance matrix correspond- 
ing to the actual noise level (middle) and corresponding to the noise- level projected for Planck (bottom). 
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FIG. 6. Window functions before and after power tilting. 
Same as the previous figure, but in the Fourier (multipole) domain, showing the corresponding window functions 
for = 30. 
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FIG. 7. The i, = 10 window functions for three of the data-presentation methods discussed, corresponding to using the 
original y, Cholesky decomposing the Fisher matrix (F = LL*) and taking its square root, respectively. 
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FIG. 8. The power spectrum observed by COBE/DMR binned into 8 bands and compared with other experiments. 
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